Continuous Distributed Stream Querying using Sketches
Authors
Abstract
While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically distributed streams. Thus, effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking a broad class of complex aggregate queries in such a distributed-streams setting. Our tracking schemes maintain approximate query answers with provable error guarantees, while simultaneously optimizing the storage space and processing time at each remote site, and the communication cost across the network. In a nutshell, our algorithms rely on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication- and space/time-efficient solutions. The end result is a powerful approximate query tracking framework that readily incorporates several complex analysis queries (including distributed join and multi-join aggregates, and approximate wavelet representations), thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model. Experiments with real data validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.
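The general idea of combining local sketches with a prediction model can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes an AMS-style linear sketch, the simplest possible "prediction" (the coordinator assumes each site's sketch is unchanged since its last message), and an L2-norm drift threshold `theta`. The names `AMSSketch`, `Site`, and `theta` are hypothetical, and the seeded PRNG only stands in for a proper four-wise independent hash family.

```python
import math
import random


class AMSSketch:
    """Linear sketch: counter j holds sum over items of count * sign_j(item)."""

    def __init__(self, num_counters, seed):
        self.num_counters = num_counters
        self.seed = seed
        self.counters = [0] * num_counters

    def _sign(self, j, item):
        # Assumption: a string-seeded PRNG replaces a four-wise independent hash.
        rng = random.Random(f"{self.seed}:{j}:{item}")
        return 1 if rng.random() < 0.5 else -1

    def update(self, item, count=1):
        for j in range(self.num_counters):
            self.counters[j] += count * self._sign(j, item)

    def merge(self, other):
        # Linearity: the sketch of a union of streams is the sum of the sketches.
        merged = AMSSketch(self.num_counters, self.seed)
        merged.counters = [a + b for a, b in zip(self.counters, other.counters)]
        return merged

    def self_join_estimate(self):
        # Mean of squared counters estimates the sum of squared frequencies.
        return sum(c * c for c in self.counters) / self.num_counters


class Site:
    """Remote site that contacts the coordinator only when its sketch drifts."""

    def __init__(self, num_counters, seed, theta):
        self.local = AMSSketch(num_counters, seed)
        self.last_sent = [0] * num_counters  # coordinator's view of this site
        self.theta = theta
        self.messages = 0

    def drift(self):
        return math.sqrt(
            sum((a - b) ** 2 for a, b in zip(self.local.counters, self.last_sent))
        )

    def observe(self, item, count=1):
        self.local.update(item, count)
        if self.drift() > self.theta:
            # Static prediction violated: ship the current sketch.
            self.last_sent = list(self.local.counters)
            self.messages += 1


if __name__ == "__main__":
    site = Site(num_counters=16, seed=42, theta=20.0)
    for _ in range(100):
        site.observe(7)
    print("updates: 100, messages sent:", site.messages)
```

Because the sketch is linear, the coordinator can add up the last-received sketches from all sites, and each site's contribution is off by at most `theta` in L2 norm. A richer prediction model would let sites stay silent for longer when their streams change in a predictable way.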
Similar resources
JSQ: Distributed querying of JSON stream data
Nowadays, the necessity for online processing of data is becoming more evident. The most convenient way to perform analytical online processing is to declare continuous queries using special query languages. The goal of this work is to propose a system for distributed continuous query processing on clusters of commodity computers. We studied existing solutions and requirements for such systems...
A Partial Evaluation Approach for Querying Data Streams and Distributed Fragmented Relations
Due to the significant communication overhead incurred in the presence of a rapidly updated stream, centralised approaches are becoming less realistic for large scale distributed processing, especially when complicated continuous queries such as window aggregation and joins between streams and relations are processed. This paper proposes techniques to evaluate stream queries in more generic dis...
Place: a Distributed Spatio-temporal Data Stream Management System for Moving Objects
Moving objects equipped with locating devices can report their locations periodically to data stream servers. With the pervasiveness of moving objects, one single server cannot support all objects and queries in a wide area. As a result, multiple spatio-temporal data stream management systems must be deployed and thus result in a server network. It is vital for servers in the network to collaborate...
Resource-Aware Ubiquitous Data Stream Querying
This paper proposes and develops a novel, iterative model for resource-aware ubiquitous data stream querying (RA-UDSQ). Our model provides timely results to mobile users at regular time intervals specified by the user, thereby executing continuous stream queries. This model is capable of adapting to high data rates of streams and limited memory resources available on a mobile device while exec...
Stream-temporal Querying with Ontologies
Recent years have seen theoretical and practical efforts on temporalizing and streamifying ontology-based data access (OBDA). This paper contributes to the practical efforts with a description/evaluation of a prototype implementation for the stream-temporal query language framework STARQL. STARQL serves the needs for industrially motivated scenarios, providing the same interface for querying hi...